Left neighbor | Right neighbor | Frequency |
---|---|---|
нь | ч | 566 |
нь | энэ | 390 |
юм | бол | 509 |
юм | байна | 952 |
байна | гэж | 399 |
ч | байсан | 170 |
ч | энэ | 188 |
ч | юм | 347 |
л | гэж | 115 |
л | энэ | 86 |
л | байгаа | 137 |
л | юм | 218 |
л | байсан | 146 |
л | бол | 141 |
л | байна | 311 |
гэж | байсан | 103 |
гэж | байна | 266 |
гэж | байгаа | 308 |
байгаа | ч | 182 |
байгаа | бол | 208 |
байгаа | нь | 815 |
байгаа | юм | 1533 |
байсан | нь | 245 |
байсан | гэж | 127 |
байсан | ч | 314 |
байсан | бол | 334 |
байсан | юм | 800 |
бол | энэ | 129 |
энэ | нь | 246 |
энэ | бол | 118 |
The table above lists NN co-occurrences, i.e. pairs of directly adjacent words, among the 10 most frequent words.
The graph below gives much more information. Here, the top 1000 words are plotted against each other, and each dot marks an NN co-occurrence. The diameter of a dot increases with the significance of the co-occurrence. Both axes are scaled logarithmically to shift the emphasis to the top words.
The picture above is very typical for a language, hence the name language fingerprint. By comparing these fingerprints across different languages, one can identify determiners, prepositions, etc. by their graphical properties.
Frequency of the most frequent word:
select @maxfreq:=(select freq from words where w_id=101);
Table data:
select w1.word,w2.word,c.freq from co_n c, words w1, words w2 where w1.w_id=w1_id and w2.w_id=w2_id and w1_id>100 and w2_id>100 and 110>=w1_id and 110>=w2_id and c.freq>(select count(*) from sentences)/100000 order by w1.w_id;
Picture data:
select if(12>w1_id-99,w1.word,"-"),if(12>w2_id-99,w2.word,"-"),w1_id-99,w2_id-99,1/(log(c.freq/@maxfreq)*log(c.freq/@maxfreq)/20) from co_n c, words w1, words w2 where w1.w_id=w1_id and w2.w_id=w2_id and w1_id>100 and w2_id>100 and 1100>=w1_id and 1100>=w2_id and c.freq>(select count(*) from sentences)/100000;
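The coordinate and size expressions in the picture query are dense, so the following minimal Python sketch mirrors them outside the database. It assumes that MySQL's one-argument `log()` is the natural logarithm and that `w_id` 101 is the most frequent word, as the `@maxfreq` query suggests. The helper names `dot_size` and `plot_coords` and the sample frequencies are hypothetical, chosen only for illustration.

```python
import math
from typing import Tuple


def dot_size(freq: float, max_freq: float) -> float:
    """Mirror 1/(log(f/max)*log(f/max)/20) from the picture query.

    Hypothetical helper. Assumes 0 < freq < max_freq, because the
    expression is undefined when freq equals max_freq (log(1) = 0).
    """
    if not 0 < freq < max_freq:
        raise ValueError("freq must satisfy 0 < freq < max_freq")
    ratio_log = math.log(freq / max_freq)  # natural log, negative here
    return 20.0 / (ratio_log * ratio_log)


def plot_coords(w1_id: int, w2_id: int) -> Tuple[int, int]:
    """Mirror w1_id-99 and w2_id-99: w_id 101 is plotted at position 2."""
    return (w1_id - 99, w2_id - 99)


if __name__ == "__main__":
    # Made-up frequencies, not taken from the corpus.
    for freq in (100, 1000, 5000):
        print(freq, round(dot_size(freq, 10000), 3))
    print(plot_coords(101, 110))
```

Under these assumptions, the size grows as the pair frequency approaches the frequency of the top word, so the most frequent neighbor pairs get the largest dots. A pair exactly as frequent as the top word would make the logarithm zero; the sketch rejects that case rather than guessing how the original pipeline handles it.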